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Abstract. Although the broad outlines of the appropriate pipeline for cosmological likelihood analysis 
with CMB data has been known for several years, only recently have we had to contend with the full, 
large-scale, computationally challenging problem involving both highly-correlated noise and extremely large 
datasets (N > 1000). In this talk we concentrate on the beginning and end of this process. First, we discuss 
estimating the noise covariance from the data itself in a rigorous and unbiased way; this is essentially an iter- 
ated minimum-variance mapmaking approach. We also discuss the unbiased determination of cosmological 
parameters from estimates of the power spectrum or experimental bandpowers. 



In principle, CMB anisotropy data analysis is easy: for at least some subset of theories (i.e., inflation) and 
some subset of CMB experiments, exact likelihood functions can be written down. In these cases, at least, 
the rest of the problem is implementation: the numerical calculation of the probability distribution of the 
interesting parameters given the data. 

We model a measurement of the CMB as 



where d t is the data at time t iy T p is the beam-smeared temperature at pixel p (pointed at by unit vector x p ), 
Ai p is the pointing operator such that Ai P = 1 when pixel p is observed at time i, and rii is the noise at time 
i. The noise is characterised by its distribution, for our purposes assumed to be a Gaussian with covariance 

{riin v ) = Nu> 

Given the data, then, we first wish to solve for the sky temperature T p and its posterior distribution, and 
then use this map to determine the statistical properties of the CMB sky, such as its power spectrum, Ce and 
its distribution, from which we will finally determine the underlying cosmological parameters. 

Unfortunately, this implementation is indeed complicated in realistic cases. It is usually assumed that 
the error distribution for an experiment is somehow handed down to the theorists and data analysts by the 
experimenters, themselves with a direct link to some higher being who decides these things. Of course this 
is not actually the case: instead, the error distribution itself must be estimated from the data along with the 
signal. In some (not necessarily realistic) cases, this is simple. If it were known a priori that the noise were 
Gaussian and white, then Na> = ct 2 8u' and we could use the histogram of observations at a given pixel to 
estimate the noise variance a. 

In more realistic cases, the noise in the timestream will be correlated and contaminated by systematic 
problems and glitches such as cosmic rays, making any simple procedure like this infeasible in practice. In 
this paper we outline a self-consistent Bayesian (likelihood) method for determining the noise power spectrum 
along with the CMB map. 



I INTRODUCTION 
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Given a map and its statistical distribution (i.e., the error matrix for Gaussian noise), determining the 
power spectrum (for the case of a Gaussian CMB signal, as well) has been well-studied; problems remain in 
the implementation of efficient algorithms, but the ideal solution is known. 

In this paper, we also address the question of what to do next: how do we go from an estimate of the 
CMB fluctuation power spectrum, Ce, to the underlying cosmological parameters? Because of the complicated 
(non-Gaussian) nature of the likelihood function, we will show that simple \ 2 techniques will necessarily bias 
the determination of the parameters, but will derive an ansatz for eliminating this bias. 

The plan of this paper is as follows. First, we discuss the CMB Data- Analysis Pipeline from the standpoint of 
likelihood functions. Then we address the specific problem of simultaneously determining the noise and signal 
from the timestream. Finally, we address the problem of estimating parameters from the power spectrum. 

II LIKELIHOODS FOR CMB EXPERIMENTS 

As has become customary, we start our analysis with Bayes' theorem 

P(9\DI) oc P(9\I)P(D\9I) (2) 

where 9 are the parameters we are trying to determine, D is the data, and / is the "background information" 
describing the problem. The quantity P{D\9I) is thus the likelihood, the probability of the data given a specific 
set of parameters, P(0\I) is the prior probability for the parameters, and the left-hand side of the equation is 
the posterior probability for the parameters given the data. 

As above and throughout, we will take the data, dt, as given by a time series of CMB measurements, 

d% = S{ -\- Tii = ^ ^ AipTp -)- Uj, (3) 
p 

where i labels the time, t = iSt, Sj and are the experimental noise and sky signal contributions at that time. 
The signal is in turn given by the operation of a "pointing matrix," A ip on the sky signal at pixel p, T p (i.e., 
the "map"); we take the latter to be already pixelized and smeared by the experimental beam, so A is a very 
sparse matrix with a single "1" entry for each time corresponding to the observed pixel. In the future we will 
often rely on the summation convention and write ^2 p Ai P T p = Ai P T p , or occasionally use matrix notation, as 
in AT, etc. 

We will assume that the observed noise rn is a realization of a stationary Gaussian process with power 
spectrum N(uj). This means that the correlation matrix of the noise is given by 

(mm,) = Nw = j ^ N(uj)e-^- l 'K (4) 

The stationarity of the process requires (or is defined by) Nui — N(ti — t'A. 
Most generally, we will take the parameters to be 

• The observed CMB signal on the sky, T p ; 

• The power spectrum of the noise, N(w) 

• (Possibly) any cosmological parameters which describe the distribution of the T p (i.e., the CMB power 
spectrum, Ce, although we could also use the cosmological paramters such as Ho and fl directly). 

Sometimes, we will assign a prior distribution for the CMB signal such that the cosmological parameters will 
be irrelevent; other times we will marginalize over the CMB signal itself and determine those parameters. 
With these parameters and the data, di, Bayes' theorem becomes 

P[T P , N, Ce\di,I] oc P[N\I] P(T P , C t \I) P[di\N{u),T p , I] . (5) 

Here, we have used two pieces of information to simplify slightly. First, the noise power spectrum, N(ui) 
does not depend at all on the signal, so we can separate out its prior distribution. Second, given the noise 
power spectrum and the sky signal, the likelihood does not depend upon the cosmological parameters. For the 
Gaussian noise we assume, the likelihood is simply 
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-21n£ = -2ln P[di\N,T p ] 

= In|JV«i| +5^(dj - fliJJV^^dj, - s»0 = ^ [lniV fe + \d k - ~s k \ 2 /N k 



(6) 



(ignoring an additive constant); recall that Sj = ^ p A ip T p . The second equality uses tildes to denote the 
discrete Fourier transform at angular frequency io k (see appendix and/or below. . . ). 
We will now apply these general formulae to various cases. 



A Known noise power spectrum 

We will start with the simplest case, where we have complete prior knowledge of the noise power spectrum. 
This is the case that has been previously discussed in the literature, but we emphasize that it is very unrealistic! 

We assign a delta-function prior distribution to N, transforming it in effect from a parameter to part of 
the prior knowledge. First, we assume no cosmological information about the distribution of temperatures 
on the sky: P(T p ,Ce\I) = P(T p \I)P(Ce\I); with this separation the posterior for Ce is simply the prior — the 
experiment gives us no new information. We will also assign a uniform prior to T p , lacking further information. 
Now, the posterior distribution for the sky temperature is simply proportional to the likelihood, which can be 
rewritten by completing the square as 

P(T p \N,d,I) cx P(d\T p ,N,I) = 1 — T j^exp 

with the mean (also, the likelihood maximum) given by 

T=(A T N- 1 A)~ 1 A T N- 1 d (8) 
(in matrix notation) , and the noise correlation matrix by 

C N = (A T N- 1 A)- 1 . (9) 

Occasionally, the inverse of this correlation matrix is referred to as the weight matrix. As is usual for linear 
Gaussian models, the mean is just the multidimensional least-squares solution to d = AT with noise correlation 
N. This is just the mapmaking procedure advocated in Refs. [1,2], cast into the form of a Bayesian parameter- 
estimation problem. 

For the case of known noise, however, this map is more than a just a visual representation of the data. Even 
if we wish to determine the cosmological parameters, it is an essential quantity. We can write the prior for 
both the map and the spectrum as 

P(C e ,T p \I) = P(T p \Ce,I)P(C e \I) (10) 

using the laws of probability, and so we can see that our rewriting of the likelihood in the form of Eq. 7 remains 
useful. That is, the full distribution is only a function of the data through the maximum-likelihood map, T — in 
statistical parlance, T is a sufficient statistic. Thus for known noise, we can always start by making a map 
(and calculating its noise matrix, Cjv)- 



2 

pp' 



Cn,pp' ( t p' 



Tp') 



(7) 



1 Cosmological CMB priors 

Here, we briefly examine the specific form of the signal prior, P(T p \Ce, I), motivated by simple Gaussian 
models. That is, we take the sky temperature, T p , to be an actual realization of a Gaussian CMB sky, with 
covariance specified by the power spectrum, Ce, 

(T P T P ,) = C T , PP , = ^^-C e BjP e (x p ■ v) (11) 
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(note that we include beam-smearing by a symmetric beam with spherical harmonic transform Bg in this 
definition); x p ■ x p > gives the cosine of the angle between the pixels. With this covariance, the prior becomes 



P(T p \C e ,I) = 



2irG 



Tpp> 



,1/2 



exp 



9 yi^^T.ii)'^ 1 



pp' 



(12) 



We thus have a posterior distribution for T p and Cg which is the product of two Gaussians. In the usual 
cosmological likelihood problem, we don't care about the actual sky temperature per se, but are concerned 
with the Cg (or the parameters upon which the power spectrum depends). Thus, we can marginalize over the 



P(C t \d,I) = J dT p P(Cg,T p \d,I) = P(Cg\I) J dT p P(T p \Cg,I)P(d\T p ,I) = P(Cg\I)P(f(d)\C e I) 



= P{C t \I) 



\2ttC t \ 1/2 \2itC n \ 1/2 



exp 



pp' 



N ,pp' 



(13) 



where we have incuded a prior for the power spectrum itself, so we can write the Gaussian factor as the 
likelihood for the map given Cg, P(Cg\d,I) oc P(Cg\T)P(f\Cg). The equation defines the effective likelihood 
for the map (T, now considered as the data rather than the timestream itself, d), which is easily computed 
again by completing the square, giving 



P(P P \CgI) 



\2tt (Ctpp 1 + Cn P p' 



1 1/2 



exp 



pp' 



(14) 



This is just the usual CMB likelihood formula: the "observed map," T p , is just the sum of two quantities (noise 
and signal) distributed as independent gaussians. Note again that the data only enters through the maximum 
likelihood map, T, although that calculation is only implicit in this formula. Further, the power spectrum Cg 
only enters through the signal correlation matrix, Ct, and in a very nonlinear way. 

We can also play a slightly different game with the likelihood. If we retain the gaussian prior for the CMB 
temperature but fix the CMB power spectrum, we can estimate the map with this additional prior knowledge. 
We will again be able to complete the square in the exponential and see that T p is distributed as a gaussian: 



P(T p \C e ,d,I) 



■ exp 



- X 2 (T p \Cg,d,I) 



with 



1 2*Cw 1 1/2 

X 2 (T P \C e , d, J) = ^ (f p (Wf) p ) C w ] pp , (T p , - (WT) P ,) . 
pp' 

Now, the mean is given by 

(Wf) = C T (C T + Cjv)" 1 ? 
which is just the Wiener Filter*. It has correlation matrix given by 

C\y — Ct(Ct + Cjv) 1 CV- 



(15) 



(16) 



(17) 



(18) 



Note that the maximum-likelihood map, T still appears in these formulae, but it is no longer the maximum of 
the posterior distribution, now given by the Wiener filter, WT. 

This subsection has shown how many of the usual CMB data calculations can be seen as different uses of 
the Bayesian formalism: 

• the least-squares map (seeing it as a "sufficient statistic" ) , 

• the CMB cosmological-parameter likelihood function, and 

• the Wiener filter map of the CMB signal. 

The differences depend on what quantity is estimated (the map or the power spectrum) and what prior 
information is included. 
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III SIMULTANEOUS ESTIMATION OF THE NOISE AND SKY MAP 

We now turn to the (realistic!) case where the noise covariance of the experiment are not known prior to 
receiving the data. That is, we must use the data itself to estimate the noise properties. 

The previous subsection briefly outlined the Bayesian approach to CMB statistics, assuming a known noise 
power spectrum. Now, we will relax this assumption and approach the more realistic case when we must 
estimate both the experimental noise and the anisotropy of the CMB. We will take as our model a noise power 
spectrum of amplitude N a in bands numbered a, with a shape in each band given by a fixed function Pk with 
a width of n a (the number of discrete modes in the band); we will usually take Pk = const so N is piecewise 
constant. That is, 



N(tu k ) = N a P k . 



(19) 



We again assign a constant prior to the sky map, T p . As the prior for the noise we will take P(N a ) oc 1/-/V^ 
With v = 1 and a single band, this is the usual Jeffereys prior advocated for "scale parameters" and the units 
on N a are irrelevent. 

With this model and priors, the joint likelihood for the noise and the map becomes 
P(T p , N a \d, I) oc Yl . v l na/2 cxp -^L- Y k dk- A kp T p 



N' 



n 



v+n a /2 



N' 



cxp 



\e k \ 2 



(20) 



where k £ a refers to a sum over modes in band a. We also define the estimate of the noise as e = d — AT for 
future use. 

We can simultaneously solve for the maximum-probability noise and signal. Carrying out the derivatives 
gives 



dlnC 
dT v 



= E 



-TT E p" (dk - Akp'Tp,). 



N <* Pk 



*-kp 



= {d- AT) T N~ X A = e T N- 1 A 
(switching between indices in Fourier space and matrix notation) and 



(21) 



dlnC 



1 1 



(n Q + 1v) 



dk Akp'T p / 



Setting these to zero and solving, we then find me must simultaneously satisfy 

T= (A T N- 1 Ay 1 A T N- 1 d 
(using matrix notation for simplicity) and 

1 1 3 7 m 2 1 V- 1 



n a + 2v f-^ P k 



n a + 2v fr* Pk 



- ,2 



(22) 



(23) 



(24) 



Equation 23 is just the usual maximum-likelihood map solution; Equation 24 is the average "periodogram" 
of the noise over the band a, with a slight modification for the prior probability, parameterized by v; for 
wide bands this is irrelevant. As we will see below, iteration is actually a very efficient way to solve these 
simulataneously. For future reference, wc also write down the derivative with respect to the map at the joint 
maximum (i.e., substituting Eq. 24 into Eq. 21), 



dlnC 
dT v 



E 



(n a + 2v) 



J2kea ZkAkp/Pk 



(25) 
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If we fix the noise at the joint maximum, then the problem reduces to that of the previous section, a Gaussian 
likelihood in T p with which the usual tools can be applied. This is not a rigorous approach to the problem, 
however due to correlation between noise and signal estimation. To get a handle on this, we calculate the 
curvature of the distribution around this joint maximum, which we define as 



T = 



Qpp' Qkp 

Gpk Gkk' 



(26) 



where a subscript a or p refers to a derivative with respect to N a or T p , respectively. T refers to the full 
matrix; Q to the sub-blocks. Explicitly, the parameter derivatives are 



d 2 \nC 



dT p dT p 



AkpAkpi 



= -A T N-'A 



and 



k£a 



k 



d 2 ln£ 

dN a dN a 



_ n a + 2v 

— Vaa' — 2^y2 aa ' 



(27) 



(28) 



with the cross-curvature 



d 2 ln£ 

8T p dN a 



1 &kAkp 

" «6a 



(29) 



The distribution about the joint maximum is not a Gaussian in the N a directions, so there is more information 
available than this. Nonetheless, if we treat the distribution as if it were Gaussian, we can calculate the 
covariance matrix, given by the inverse of this curvature matrix: 



T- 1 



{Qpp' GpaG aa 'Ga' p) (^Gap Gaa f G a 'p/Gp f p 

i^Gpa. ~~ Gpp'Gpi a 'Ga' c^j |^aa' GapGpp' 

In particular, the effective variance of the map is increased from Cat to 

(Qn pp> ~ GpaG aa iGa'p') 



ricff 
^Npp' 



\ ^ 1 \ ^ AkpAkp/ 

a kea 



^E 



^k-A-kp v £fe' Afcfp' 



Pk> 



(30) 



(31) 



which we have evaluated at the simultaneous peak of the distribution to eliminate N a . 

In Ref. [3], we examine this correction to C^ 1 (the weight matrix) more closely; for sufficiently wide bands, 
relative to the number of pixels in the map, the correction is ncgligeable. 



A Noise Marginalization 



Instead of this joint solution, formally, at least, we know the appropriate procedure: marginalize over the 
quantity we don't care about (the noise power spectrum) to obtain the distribution for the quantity we wish 
to know (the map, T p ). We can actually carry out this integral in this case: 



P(T p \d h I) = J dN k P(T p ,N k \d h I) cx JJ 



V — 

^ Pk 

,k£a 



dk A-kpTp 



-{n a /2+u-l) 



(32) 



(this is just Student's t distribution in a slightly different form than is usually seen). If we stay in Fourier 
space, the maximum of this distribution can be calculated to be the solution of 
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d\nP 



= E 



(n a + 2u-2) 



Efeea -k ( d k - A kp ,T p ,)A 



hp 



J2k> 



dk' ^k'p'Tp' 



(33) 



Note that this is exactly the same form as Eq. 25, the equation for the maximum probability map in the joint 
estimation case, with the prefactor (n a + 2is) replaced by [n a + 2v — 2). This is equivalent to changing the 
exponent of the prior probability from v to v — 1: the marginalized maximum for v = 1 (Jefferys prior) is the 
same as the joint maximum for v = (constant prior). We have also seen that for n a 3> 1, the value of v is 
irrelcvcnt, so that these should be nearly equal. (Moreover, the numerical tools to calculate the joint solution, 
outlined below, can be used in this case, too.) Note also that if n a + 2v — 2 < 0, the equation isn't solved for 
any (finite) map. In this case, either our prior information is so unrestrictive or the bands are so narrow that it 
is impossible to distinguish between noise and signal, and the probability distribution has no maximum when 
the noise is marginalized. 

In principle, any further analysis of the map would have to rely on the full distribution of Eq. 32. In 
practice, the t-distribution is quite close to a Gaussian [again, for n a ^> 1 — is this right?] (although the tails 
are suppressed by a power-law rather than an exponential). It will thus be an excellent approximation to take 
the distribution to be a Gaussian 



P{T p \d u I) 



1 



1/2 



exp 



pp' 



(34) 



with noise covariance given by 



{ C< Npp>) 



d 2 In P{T p \di, I) 



dTpdT, 



p' 



E 



(n a + 2v - 2) ^ A k pA kp , (n a + 2v - 2) ^ e k A kp x ^ e k > A k 



E 



Pk 



(E 



\~k 



2\2 



' P k • • 



k' 



Py 



(35) 



where T is the solution to Eq. 33 and the derivatives are taken with respect to the full distribution of Eq. 33). 
Note that the inverse of (the weight matrix) is given again by the sum of two terms. The first is just the 
weight matrix that would be assigned given the maximum probability solution for the noise, Eq. 9 (with the 
numerically irrelevent change of the prior v — > v — 1 as noted above). Again we find a correction term due 
to the fact that we are only able to estimate this noise power spectrum, where the correction has exactly the 
same form as in the non-marginalized case (Eq. 31) above with a change of the prefactor. 

In Ref. [3], we show how the simultaneous solution for noise and map can be performed iteratively. As we 
have shown, this is also the solution for the case when the noise is marginalized over. 



B Discussion 



Happily, the bottom line of this analysis is that the mapmaking phase of CMB data analysis can proceed as 
has been discussed by other authors in the past. We do recommend two slight additions to the pipeline: 

• Iterative map & noise determination 

1. Start by assuming the data stream is all noise. 

2. Calculate the noise power spectrum via FFT. 

3. Inverse FFT, and use this noise matrix for mapmaking 

4. Subtract the estimated signal contribution from the map, and repeat. 

• Calculate the correction to the noise matrix (Eq. 35) (and either use the correction or make sure it is 
ncgligeable) . 

We note that it is only necessary to solve for the map itself during the iterative process, but not to invert 
the inverse noise matrix, C^l. The former can, in principle, be accomplished in far fewer operations than a 
matrix inverse, so the iterative procedure need not add a great amount of computing time to the process. 
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However, it is conventional wisdom that an ideal observational strategy has signal to noise of unity. In this 
regime, the signal to noise in the timestream is very low, and a good aproximation to the estimator of the 
noise power spectrum is the banded periodogram of the data stream itself. Thus one is able to avoid the 
time-consuming process of performing multiple iterations. 



IV RADICAL COMPRESSION: 
FROM THE POWER SPECTRUM TO COSMOLOGICAL PARAMETERS 

Above, we discussed how to make a map of the CMB and estimate its noise properties. Elsewhere, the 
estimation of the CMB anisotropy power spectrum (Ce) from a map and its noise estimate have been described 
and implemented at least for data sets of moderate size. But the ultimate goal, of course, is to go from the 
power spectrum to the cosmological parameters. Various groups [4] have calculated how well future experiments 
are expected to perform; here, we describe what is necessary to begin to achieve such results. 

The basic problem is that a plot power spectrum estimates and associated errors do not suffice to describe 
the likelihood function of the experiment. Usually, we plot C = £(£ + l)Ce/(2w) ± oe and are tempted to do a 
y 2 procedure using those errors and possibly covarinaces between £ to fit theoretical curves to the estimates. 
As we have seen above, however, the shape of the likelihood as a function of the Ce is not Gaussian, implicitly 
assumed in the use of a y 2 procedure. 

One solution is simply to use a full likelihood-minimization procedure on the cosmological parameters given 
the entire dataset. That is, evaluate Eq. 14 directly for points throughout the cosmological parameter space 
(over which Ce and thereby the correlation matrix Ct will vary). Since all calculations of the likelihood 
function to date require O(N^) operations in general, this is prohibitive for experiments of even moderate 
size (especially if the likelihood must be searched over something like the 11-dimensional space appropriate for 
inflation-inspired models). 

Instead, we must find efficient ways to approximate the likelihood function based on minimal information. 
In the rest of this paper, we discuss approximations motivated by our knowledge of the likelihood function. 
Each requires only knowledge of the likelihood peak and curvature (or variance) as well as a third quantity 
related to the noise properties of the experiment. Alternately, for already-calculated likelihood functions, each 
approximation gives a functional form for fitting with a small number of parameters. 



A The solution: approximating the likelihood 

1 Offset lognormal distribution 

We already know enough about the likelihood to see a solution to this problem. For a given multipole £, 
there are two distinct regimes of likelihood. Consider a simple all-sky experiment with uniform pixel noise and 
some symmetric beam. Taking such a map as our data, we write T p = T p + n p which we transform to spherical 
harmonics, so the data has contributions from the signal, ae m , and the noise, ne m , and the likelihood is 

-2\nP(T\C t ) = ^(2£+l) 

i 

( up to an irrelevant additive constant; cf. Eq. 14), with Me = £(£+l)Ne/(2n), where Ne — (\ne m \ 2 ) is the noise 
power spectrum in spherical harmonics, and T>e = [£(£ + 1)/(2tt)] ^2 m \ae m \ 2 /(2£ + 1) is the power spectrum 
of the full data (noise plus beam-smoothed signal); we have written it as a different symbol from above to 
emphasize the inclusion of noise and again use script lettering to refer to quantities multiplied by £(£ + l)/(2n). 
Now, the likelihood is maximized at Ce = (T>e — Ne)/B 2 and the curvature about this maximum is given by 

so the error (defined by the variance) on a Ce is 



In (CeBj + Me 



CeB 2 +Me 



(36) 



6C e = (C e + Me/B 2 ) /^£ + T/2. 



(38) 
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Note that in this expression there is once again indication of a bias if we assume Gaussianity: upward 
fluctuations have larger uncertainty than downward fluctuations. But this is not true for Ze where Ze is 
defined so that SZ e oc 5Ce/ {Ce + Me/ Bj). More precisely, Z t = ln(C e + Ne/Bj). Since 5Zg is proportional to a 
constant, our approximation to the likelihood is to take Ze as normally distributed. That is, we approximate 

-21nP(T|C<) = z 2z i (m< z ))~* Z v (39) 

(up to a constant) where Nlffi = (C e +x e )M$ (C e < +x t >) where M< c ) is the covariance matrix of the Ce, usually 
taken to be the inverse of the curvature matrix. We refer to Eq. 39 as the offset lognormal distribution of Cg. 
Somewhat more generally we write 

Z e = \n(C e + x t ) (40) 

for some constant xe, which for the case in hand is xe — Me/Bf. 

It is illustrative to derive the quantity Ze in a somewhat more abstract fashion. We wish to find a change of 
variables from Ce to Ze such that the curvature matrix is a constant: 



0. (41) 



dZ e 

That is, we want to find a change of variables such that 

is not a function of Z. We immediately know of one such transformation which would seem to do the trick: 

^ (43) 



dZ L _ 1/2 



dCe 

where the 1/2 indicates a Cholesky decomposition or Hermitian square root. In general, this will be a horren- 
dously overdetermined set of equations, N 2 equations in N unknowns. However, we can solve this equation 
in general if we take the curvature matrix to be given everywhere by the diagonal form for the simplified 
experiment we have been discussing (Eq. 37). In this case, the equations decouple and lose their dependence 
on the data, becoming (up to a constant factor) 

g = (C,+^/ J B|)- 1 . (44) 

The solution to this differential equation is just what we expected, 

Z t = \D.{C t +Mi/Bl) (45) 

with correlation matrix 



V J i 



'LL' 

(C L +Af L /B 2 L )(C L ,+JV L ,/Bi,) 



(46) 



where £ — L and (! = L' . Please note that we are calculating a constant correlation matrix; the Ce in the 
denominator of this expression should be taken at the peak of the likelihood (i.e., the estimated quantities). 

We emphasize that, even for an all-sky continuously and uniformly sampled experiment (for which Eq. 37 
is exact), this Gaussian form, Eq. 39, is only an approximation, since the curvature matrix is given by Eq. 37 
only at the peak. Nonetheless we expect it to be a better approximation than a naive Gaussian in Ce (which 
we note is the limit xe — > oo of the offset lognormal) . 

We have also considered another ansatz known as the equal variance approximation which we have found 
sometimes reproduces the shape of the likelihood better than the offset lognormal form above, especially in 
the case of upper limits on Ce- 
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B Noise power 

In the above, we have derived the extra quantity needed in this ansatz, xe = Me, but this was explicitly done 
only for the unrealistic case of an experiment looking at the whole sky with uniform noise, with a likelihood 
function Eq. 36. Can we generalize this to more complicated experiments, which have the more complicated 
likelihood function Eq. 14, with correlation matrices encoding such effects as 

• nonuniform noise; 

• correlated noise; 

• incomplete sky coverage; and/or 

• complicated chopping schemes? 

Happily, the answer to this question is that we can, and have succesfully applied the results to experiments as 
diverse as COBE/DMR and Saskatoon. 

For a map with noise correlation matrix Cn, and signal correlation matrix Ct, we generalize our definition 
of x to 

, / Tr (C^Ct^C^Ct.b) ^ / Tr {C^Ct.bC^Ctm) _ , 

B/ B V T± {C^Ct,bC^C t ,b) ~ V TV(C-iC T , s C-iC T , B ) K ' 

where we use B to label bands of I rather than individual multipolcs, so that Ce has a fixed shape in each band 
with amplitude Cb and Ct,b = dCr /dCs is the derivative of the signal correlation matrix with respect to the 
power in the band. The second expression would be appropriate for a full-sky experiment where the signal and 
noise separate in the traces; it may be numerically easier to calculate due to zero or negative eigenvalues in 
Ct that often appear due to approximation. Other expressions that may be more appropriate for particular 
experimental strategies are given in Ref. [5] . 



C Tests and Results 

Because exact likelihoods have been calculated for many different experiments to date, we can check this 
ansatz in great detail. (In the future, this will probably be impossible because the exact calculation of the 
likelihood is so expensive that other techniques for finding the maximum will be necessary [6].) 



1 COBE/DMR 

First, we compare with the likelihood for COBE/DMR at several values of £. In Fig. 1 we show the actual 
likelihood for the COBE quadrupole and other multipoles, along with the Gaussian that would be assumed 
given the curvature matrix calculated from the data. The figures show another way of understanding the bias 
introduced by assuming Gaussianity: upward deviations from the mean (which is not actually the mean of the 
non-Gaussian distribution, but the mode) are overly disfavored by the Gaussian distributions while downward 
ones are overly probable. For example, the standard-CDM value of C2 = 770/xK 2 is only 0.2 times less likely 
than the most likely value of 150^iK 2 but it seems like a 5-sigma excursion (4 x ICP 6 times less likely) based 
on the curvature alone. 

Although it is extremely pronounced in the case of the quadrupole this is a problem that plagues all CMB 
data: the actual distribution is skewed to allow larger positive excursions than negative. The full likelihood 
"knows" about this and in fact takes it into account; however, if we compress the data to observed Ce ± ere 
(or even observed Ce and a correlation matrix Mee>) we lose this information about the shape of the likelihood 
function. Because of its relation to the well-known phenomenon of cosmic variance, we choose to call this 
problem one of cosmic bias. 

We emphasize that cosmic bias can be important even in high-S/N experiments with many pixels. We might 
expect the central limit theorem to hold in this case and the distributions to become Gaussian. Indeed they do, 
at least near the peak. However, the central limit theorem does not guarantee that the tails of the distribution 
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FIGURE 1. Full and approximate COBE/DMR likelihoods P(A|Cf) for various values of £, as marked. The horizontal 
axis is Ci — £(£ + l)Ce/(2ir). The upper right panel gives the cumulative probability. The solid (black) line is the full 
likelihood calculated exactly. The short-dashed (red) line is the Gaussian approximation about the peak. The dotted 
(cyan) line is a Gaussian in hiCe; the dashed (magenta) line is a Gaussian in In (Ci + xt), as discussed in the text. The 
dot-dashed (green) line is the equal- variance approximation. 
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will be Gaussian and there is the danger that a few seemingly discrepant points are given considerably more 
weight than they deserve. Cosmic bias has also been noted in previous work [7-9]. 

Putting the problem a bit more formally, we see that even in the limit of infinite signal-to-noise we cannot use 
a simple x 2 test on Ci estimates; such a test implicitly assumes a Gaussian likelihood. Unlike the distribution 
discussed here, a Gaussian would have constant curvature (5Ct = constant), rather than SCe oc Cg as illustrated 
here. 



2 Saskatoon 



We apply this ansatz to the Saskatoon experiment, perhaps the apotheosis of a complicated chopping ex- 
periment. The Saskatoon data are reported as complicated chopping patterns (i.e., beam patterns, H, above) 
in a disk of radius about 8° around the North Celestial Pole. The data were taken over 1993-1995 (although 
we only use the 1994-1995 data) at an angular resolution of 1.0-0.5° FWHM at approximately 30 GHz and 
40 GHz. More details can be found in Refs. [10,11]. The combination of the beam size, chopping pattern, and 
sky coverage mean that Saskatoon is sensitive to the power spectrum over the range I = 50-350. The Saskatoon 
dataset is calibrated by observations of supernova remnant, Cassiopeia-A. Leitch and collaborators [12] have 
recently measured the flux and find that the remnant is 5% brighter than the previous best determination. We 
have renormalized the Saskatoon data accordingly. 

We calculated Ci for this dataset in [7]. We combine these results with the data's noise matrix to calculate 
the appropriate correlation matrixes (in this case, the full curvature matrix) for Saskatoon and hence the 
appropriate xb (Eq. 47) and thus our approximations to the full likelihood. In Figure 2, we show the full 
likelihood, the naive Gaussian approximation, and our present offset lognormal and equal-variance forms. 
Again, both approximations reproduce the features of the likelihood function reasonably well, even into the 
tails of the distribution, certainly better than the Gaussian approximation. They seem to do considerably better 
in the higher- £ bands; even in the lower £ bands, however, the approximations result in a wider distribution 
which is preferable to the narrower Gaussian and its resultant strong bias. Moreover, we have found that 
we are able to reproduce the shape of the true likelihood essentially perfectly down to better than "three 
sigma" if we simply fit for the x b (but of course this can only be done when we have already calculated the 
full likelihood — precisely what we are trying to avoid!). For existing likelihood calculations, this method can 
provide better results without any new calculations (see [5] for our recommendations for the reporting of CMB 
bandpower results for extant, ongoing, and future experiments). 

We have found that the shape of the power spectrum used with each bin of I can have an impact on the 
likelihood function evaluated using this ansatz. Similarly, a finer binning in t will reproduce the full likelihood 
more accurately. Although the maximum-likelihood amplitude at a fixed shape (n s ) does not significantly 
depend on binning or shape, the shape of the likelihood function along the maximum-likelihood ridge changes 
with finer binning and with the assumed spectral shape. 



3 COBE/DMR + Saskatoon 

As a further example and test of these methods, we can combine the results from Saskatoon and COBE/DMR 
in order to determine cosmological parameters. For this example, we use the orthogonal linear combinations as 
described in the previous section. In Figure 3 we show the likelihood contours for standard CDM, varying the 
scalar slope n s and amplitude as- As before, we see that the naive x 2 procedure is biased toward low amplitudes 
at fixed shape (n s ), but that our new approximation recovers the peak quite well. The full likelihood gives 
a global maximum at (n s ,cr 8 ) = (1.15,1.67), and our approximation at (1.13,1.58), while the naive x 2 finds 
it at (1.21,1.55), outside even the three-sigma contours for the full likelihood. We can also marginalize over 
either parameter, in which case the full likelihood gives n s = 1.171^07! c 8 = 1.68j^2i> our ansatz gives 
n s = lAAtom, 08 = 1.60 ± 0.15; and the naive x 2 gives n s = 1.21±g;g|, er 8 = 1.55±g;^. (Note that even with 
the naive x 2 we marginalize by explicit integration, since the shape of the likelihood in parameter space is 
non-Gaussian in all cases.) 
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FIGURE 2. Full and approximate Saskatoon likelihoods. As in Fig. 1. The solid (black) line is the full likelihood 
calculated exactly. The short-dashed (red) line is the Gaussian approximation about the peak. The dotted (cyan) line 
is a Gaussian in lnC^; the dashed (magenta) line is a Gaussian in In (Ce + xe), as discussed in the text. The dot-dashed 
(green) line is the equal- variance approximation. 
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FIGURE 3. Likelihood contours for COBE/DMR and Saskatoon combined for the cosmological parameters n s and as 
(with otherwise standard CDM values) combining likelihoods from COBE/DMR and Saskatoon. Contours are for ratios 
of the likelihood to its maximum equal to exp —v 2 /2 with v = 1,2, 3. Upper panel is for the full likelihood (dashed) and 
its offset lognormal approximation as a Gaussian in In (Ce + xe) (solid; see text); lower panel shows the full likelihood 
and its approximation as a Gaussian in Ce. 
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4 Parameter Estimation 

Above, we have discussed many different approximations to the likelihood C. Here we discuss finding 
the parameters that maximize this likelihood (minimize the x 2 = — 21n£). We then apply our methods to 
estimating the power in discrete bins of £. This application provides another demonstration of the importance 
of using a better approximation to the likelihood than a Gaussian. 

The likelihood functions above depend on Ce which may in turn depend on other parameters, a p , which are, 
e.g., the physical parameters of a theory. If we write the parameters as a p + 8a p we can find the correction, 
Sa p , that minimizes x 2 by solving 



where 



Tpp ' ~ 2 da n da n , (49) 



is the curvature matrix for the parameters a p . If the \ 2 were quadratic (i.e., Gaussian C) then Eq. 48 would be 
exact. Otherwise, in most cases, near enough to its minimum, x 2 is approximately quadratic and an iterative 
application of Eq. 48 converges quite rapidly. The covariance matrix for the uncertainty in the parameters is 
given by {Sa p 5a p i) — T^,. This is just an approximation to the Newton- Raphson technique for finding the 
root of d£/da p = 0; a similar techniqe is used in quadratic estimation of Ce [13,7,9]. 

As our worked example here, we parameterize the power spectrum by the power in B = 1 to 11 bins, Cb- 
Within each of the bins, we assume Ce = Cb to be independent of I. We have chosen the offset lognormal 
approximation. The x 2 completely describes the model: 

x 2 = E ( z > - z t) m ?j ( z ) z t) + *c 2 al ; (50) 

a a 

Zt = \n{D l+Xi )- (52) 



In ^2 u u(€)fiBC B + x^j ; (53) 

M? = My (A + Xi) (Dj + xj) no sum; (54) 

where Mjj is the weight matrix for the band powers Di. We have modeled the signal contribution to the data, 
Di, as an average over the power spectrum, lisCs, times a calibration parameter, u a uy For simplicity, we 
take the prior probability distribution for this parameter to be normally distributed. Since the datasets have 
already been calibrated, the mean of this distribution is at u a = 1. The calibration parameter index, a is a 
function of i since different power spectrum constraints from the same dataset all share the same calibration 
uncertainty. We solve simultaneously for the u a and Cb] i-e., together they form the set of parameters, a p , in 
Eq. 48. For those experiments reported as band-powers together with the trace of the window function, W\, 
the filter is taken to be 

For Saskatoon and COBE/DMR, our Di are themselves estimates of the power in bands. For these cases the 
above equation applies, but with Wijl set to a constant within the estimated band and zero outside. The 
estimated bands have different I ranges than the target bands. 

Instead of the curvature matrix of Eq. 49 we use an approximation to it that ignores a logarithmic term. 
Including this term can cause the curvature matrix to be non-positive definite as the iteration proceeds. The 
approximation has no influence on our determination of the best fit power spectrum, but does affect the error 
bars. We expect that the effect is quite small. 
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We now proceed to find the best-fit power spectrum given different assumptions about the value of Xj, 
binnings of the power spectrum and editings of the data. 

We have determined the Xi only for COBE/DMR, Saskatoon, SP89, OVR07 and SuZIE. To test the sen- 
sitivity to the unknown X{S we found the minimum-^ 2 power spectrum assuming the two extremes of xi = 
(corresponding to lognormal) and Xi — oo (corresponding to Gaussian). These two power spectra are shown in 
Fig. 4. Note that both power spectra were derived using our measured Xi values; only the unknown Xi values 
were varied. The variation in the results would be much greater if we let these Xi values be at their extremes. 




FIGURE 4. Power spectra that minimize the \ 2 m Eq. 50. The solid (dashed) error bars assume x — (x — oo) for 
those datasets with no determination of x; the two sets have been offset slightly for display purposes. Solid curve is 
standard CDM. 



V CONCLUSIONS 

In this proceedings, we have discussed the beginning and end of the CMB data-analysis process, and shown 
how they can be unified in a Bayesian likelihood formalism. We have introduced new techniques and approxi- 
mations that 

• allow simultaneous calculation of the noise properties and underlying sky map. 

• approximate the full cosmological likelihood function with minimal information. 
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Crucially, both of these innovations can be calculated in relatively small or comparable amounts of time 
to the most computationally-challenging aspects of the problem. Hence, they await the same innovations 
necessary to perform these calculations in the first place for upcoming megapixel datasets [6]. 
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